Smart Paper Notes

Robust Semantic Communications with Masked VQ-VAE Enabled Codebook

Qiyu Hu , Guangyi Zhang , Zhijin Qin , Yunlong Cai , Guanding Yu , Geoffrey Ye Li

Category: Machine Learning

2022-06-08

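Although semantic communication performs well on a wide range of tasks, the impact of semantic noise and the robustness of such systems have not been well studied. Semantic noise refers to a mismatch between the intended semantic symbols and the received ones, which causes the task to fail. In this paper, we first propose a framework for robust end-to-end semantic communication systems that combats semantic noise. In particular, we analyze sample-dependent and sample-independent semantic noise. To combat semantic noise, we develop adversarial training with weight perturbation, which incorporates samples with semantic noise into the training dataset. We then propose to mask the portion of the input where semantic noise appears frequently, and design a masked vector quantized-variational autoencoder (VQ-VAE) with a noise-related masking strategy. A discrete codebook shared by the transmitter and the receiver is used to represent the encoded features. To further improve system robustness, we develop a feature importance module (FIM) that suppresses noise-related and task-unrelated features. As a result, the transmitter only needs to transmit the codebook indices of the important task-related features. Simulation results show that the proposed method can be applied to many downstream tasks and significantly improves robustness against semantic noise while markedly reducing transmission overhead.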
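
The shared-codebook idea in this abstract can be illustrated with a minimal sketch. It assumes a tiny hand-written codebook and nearest-neighbour lookup by squared Euclidean distance; the names `nearest_index`, `transmit`, and `receive` are hypothetical, and the masking strategy, adversarial training, and FIM from the paper are not reproduced here.

```python
import math

def nearest_index(feature, codebook):
    """Return the index of the codebook vector closest to `feature`
    under squared Euclidean distance."""
    best_idx, best_dist = 0, math.inf
    for idx, code in enumerate(codebook):
        dist = sum((f - c) ** 2 for f, c in zip(feature, code))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx

def transmit(features, codebook):
    """Transmitter side: send only integer codebook indices."""
    return [nearest_index(f, codebook) for f in features]

def receive(indices, codebook):
    """Receiver side: look the indices up in the same shared codebook."""
    return [codebook[i] for i in indices]

# Toy codebook known to both ends (illustrative values, not from the paper).
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
features = [[0.1, -0.05], [0.9, 0.2], [0.2, 0.8]]

indices = transmit(features, codebook)
recovered = receive(indices, codebook)
print(indices, recovered)
```

Only the integer indices cross the channel in this sketch, which is the source of the reduced transmission overhead the abstract refers to.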

translated by Google Translate

DDPG-Driven Deep-Unfolding with Adaptive Depth for Channel Estimation with Sparse Bayesian Learning

Qiyu Hu , Shuhan Shi , Yunlong Cai , Guanding Yu

Category: Machine Learning

2022-01-20

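Deep-unfolding neural networks (NNs) have received great attention owing to their relatively low complexity. Typically, these deep-unfolding NNs are restricted to a fixed depth for all inputs. However, the optimal number of layers required for convergence varies with the input. In this paper, we first develop a deep deterministic policy gradient (DDPG)-driven deep-unfolding framework with adaptive depth for different inputs, in which the trainable parameters of the deep-unfolding NN are learned by DDPG rather than updated directly by the stochastic gradient descent algorithm. Specifically, the optimization variables, trainable parameters, and architecture of the deep-unfolding NN are designed as the state, action, and state transition of DDPG, respectively. This framework is then used to address the channel estimation problem in massive multiple-input multiple-output systems. First, we formulate the channel estimation problem with an off-grid basis and develop a sparse Bayesian learning (SBL)-based algorithm to solve it. Second, the SBL-based algorithm is unfolded into a layer-wise structure with a set of introduced trainable parameters. Third, the proposed DDPG-driven deep-unfolding framework is applied to this channel estimation problem, based on the unfolded structure of the SBL-based algorithm. To realize adaptive depth, we design a halting score, a function of the channel reconstruction error, that indicates when to stop. Furthermore, the proposed framework is extended to realize adaptive depth in general deep neural networks (DNNs). Simulation results show that the proposed algorithm outperforms conventional optimization algorithms and fixed-depth DNNs while using far fewer layers.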
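
The adaptive-depth idea can be sketched on a toy problem. The example below assumes a scalar least-squares objective, fixed per-layer step sizes standing in for trainable parameters, and a plain threshold on the reconstruction error as the halting rule. The function name `unfolded_solver` is hypothetical, and neither the DDPG training nor the SBL algorithm from the paper is reproduced.

```python
def unfolded_solver(y, a, step_sizes, tol=1e-3):
    """Run gradient-descent 'layers' on 0.5 * (y - a * x) ** 2.

    Each entry of `step_sizes` plays the role of one layer's trainable
    parameter. The loop halts as soon as the reconstruction error drops
    below `tol`, so the depth actually used depends on the input.
    """
    x = 0.0
    for depth, step in enumerate(step_sizes, start=1):
        residual = y - a * x
        x = x + step * a * residual      # one unfolded layer
        error = abs(y - a * x)           # reconstruction error after the layer
        if error < tol:                  # simple halting rule (assumption)
            return x, depth
    return x, len(step_sizes)

# Up to 20 layers are available; inputs of different scale stop at different depths.
layers = [0.5] * 20
x_small, depth_small = unfolded_solver(0.01, 1.0, layers)
x_large, depth_large = unfolded_solver(1.0, 1.0, layers)
print(depth_small, depth_large)
```

In this sketch the input that starts closer to its solution exits after fewer layers, which is the behaviour a fixed-depth network cannot provide.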

Online Statistical Inference for Matrix Contextual Bandit

Qiyu Han , Will Wei Sun , Yichen Zhang

Category: Machine Learning (Statistics) | Machine Learning

2022-12-21

Contextual bandits have been widely used for sequential decision-making based on current contextual information and historical feedback data. In modern applications, the context can be rich and can often be formulated as a matrix. Moreover, while existing bandit algorithms have mainly focused on reward maximization, less attention has been paid to statistical inference. To fill these gaps, in this work we consider a matrix contextual bandit framework where the true model parameter is a low-rank matrix, and propose a fully online procedure that simultaneously makes sequential decisions and conducts statistical inference. The low-rank structure of the model parameter and the adaptive nature of the data collection process make this difficult: standard low-rank estimators are not fully online and are biased, while existing inference approaches in bandit algorithms fail to account for the low-rankness and are also biased. To address these issues, we introduce a new online doubly-debiasing inference procedure that handles both sources of bias simultaneously. In theory, we establish the asymptotic normality of the proposed online doubly-debiased estimator and prove the validity of the constructed confidence interval. Our inference results are built upon a newly developed low-rank stochastic gradient descent estimator and its non-asymptotic convergence result, which is also of independent interest.

On the Robustness of Graph Neural Diffusion to Topology Perturbations

Yang Song , Qiyu Kang , Sijie Wang , Zhao Kai , Wee Peng Tay

Category: Machine Learning

2022-09-16

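Neural diffusion on graphs is a novel class of graph neural networks that has recently attracted increasing attention. The ability of graph neural partial differential equations (PDEs) to address common hurdles of graph neural networks (GNNs), such as over-smoothing and bottlenecks, has been investigated, but their robustness to adversarial attacks has not. In this work, we explore the robustness properties of graph neural PDEs. We empirically demonstrate that graph neural PDEs are intrinsically more robust to topology perturbations than other GNNs. We provide insight into this phenomenon by exploiting the stability of the heat semigroup under graph topology perturbations. We discuss various graph diffusion operators and relate them to existing graph neural PDEs. Furthermore, we propose a general graph neural PDE framework on which new robust GNNs can be defined. We verify that the new model achieves performance comparable to the state of the art on several benchmark datasets.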
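
As a rough illustration of diffusion on a graph, the sketch below integrates the heat equation dx/dt = -Lx with explicit Euler steps, where L is the combinatorial graph Laplacian, and compares the result on a path graph with the result after one edge is added. The helper names, the toy graph, and the choice of Laplacian are assumptions made for illustration; this is not the graph neural PDE model from the paper.

```python
def laplacian(n, edges):
    """Combinatorial Laplacian L = D - A of an undirected graph."""
    lap = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        lap[i][i] += 1.0
        lap[j][j] += 1.0
        lap[i][j] -= 1.0
        lap[j][i] -= 1.0
    return lap

def diffuse(x, edges, t=1.0, steps=100):
    """Explicit Euler integration of dx/dt = -L x up to time t."""
    n = len(x)
    lap = laplacian(n, edges)
    dt = t / steps
    x = list(x)
    for _ in range(steps):
        x = [x[i] - dt * sum(lap[i][j] * x[j] for j in range(n))
             for i in range(n)]
    return x

x0 = [1.0, 0.0, 0.0, 0.0]
path_edges = [(0, 1), (1, 2), (2, 3)]
perturbed_edges = path_edges + [(0, 3)]   # one added edge (topology perturbation)

base = diffuse(x0, path_edges)
perturbed = diffuse(x0, perturbed_edges)
gap = max(abs(a - b) for a, b in zip(base, perturbed))
print(base, perturbed, gap)
```

The value `gap` measures how far the diffused features move when the topology changes, which is the kind of quantity a stability analysis of the heat semigroup would bound.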

Domain Randomization-Enhanced Depth Simulation and Restoration for Perceiving and Grasping Specular and Transparent Objects

Qiyu Dai , Jiyao Zhang , Qiwei Li , Tianhao Wu , Hao Dong , Ziyuan Liu , Ping Tan , He Wang

Category: Computer Vision

2022-08-07

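Commercial depth sensors usually produce noisy and missing depth values, especially on specular and transparent objects, which poses critical problems for downstream depth-based or point-cloud-based tasks. To mitigate this problem, we propose a powerful RGBD fusion network, SwinDRNet, for depth restoration. We further propose the Domain Randomization-Enhanced Depth Simulation (DREDS) approach, which simulates an active stereo depth system using physically based rendering and generates a large-scale synthetic dataset containing 130K photorealistic RGB images together with simulated depths that carry realistic sensor noise. To evaluate depth restoration methods, we also curate a real-world dataset, STD, which captures 30 cluttered scenes composed of 50 objects with materials ranging from specular and transparent to diffuse. Experiments demonstrate that the proposed DREDS dataset bridges the sim-to-real domain gap, so that, once trained on DREDS, our SwinDRNet generalizes seamlessly to other real depth datasets, e.g., ClearGrasp, and outperforms competing depth restoration methods at real-time speed. We further show that our depth restoration effectively improves the performance of downstream tasks, including category-level pose estimation and grasping. Our data and code are available at https://github.com/pku-epic/dreds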

Unsupervised Monocular Depth Estimation in Highly Complex Environments

Chaoqiang Zhao , Yang Tang , Qiyu Sun

Category: Computer Vision

2021-07-28

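With the development of computational intelligence algorithms, unsupervised monocular depth and pose estimation frameworks driven by warped photometric consistency have performed well in daytime scenes. In some challenging environments, however, such as nights and rainy nights, the underlying photometric consistency assumption does not hold because of complex lighting and reflections, so these unsupervised frameworks cannot be applied directly. In this paper, we study unsupervised monocular depth estimation in highly complex scenarios and address this challenging problem with an image-transfer-based domain adaptation framework. We adapt a depth model trained on daytime scenes to nighttime scenes, and constraints on both the feature space and the output space encourage the framework to learn the key features for depth decoding. We further address the effect of unstable image-transfer quality on domain adaptation by proposing an image adaptation approach that evaluates the quality of the transferred images and re-weights the corresponding losses, improving the performance of the adapted depth model. Extensive experiments show the effectiveness of the proposed unsupervised framework in estimating dense depth maps from highly complex images.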

TinyMIM: An Empirical Study of Distilling MIM Pre-trained Models

Sucheng Ren , Fangyun Wei , Zheng Zhang , Han Hu

Category: Computer Vision

2023-01-03

Masked image modeling (MIM) performs strongly in pre-training large vision Transformers (ViTs). However, small models that are critical for real-world applications cannot or only marginally benefit from this pre-training approach. In this paper, we explore distillation techniques to transfer the success of large MIM-based pre-trained models to smaller ones. We systematically study different options in the distillation framework, including distilling targets, losses, input, network regularization, sequential distillation, etc., revealing that: 1) distilling token relations is more effective than CLS token-based and feature-based distillation; 2) using an intermediate layer of the teacher network as the target performs better than using the last layer when the depth of the student does not match that of the teacher; 3) weak regularization is preferred; etc. With these findings, we achieve significant fine-tuning accuracy improvements over from-scratch MIM pre-training on ImageNet-1K classification for the ViT-Tiny, ViT-Small, and ViT-Base models, with gains of +4.2%/+2.4%/+1.4%, respectively. Our TinyMIM model of base size achieves 52.2 mIoU on ADE20K semantic segmentation, which is +4.1 higher than the MAE baseline. Our TinyMIM model of tiny size achieves 79.6% top-1 accuracy on ImageNet-1K image classification, which sets a new record for small vision models of the same size and computation budget. This strong performance suggests an alternative way of developing small vision Transformer models, that is, by exploring better training methods rather than introducing inductive biases into architectures as in most previous works. Code is available at https://github.com/OliverRensu/TinyMIM.
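
Finding 1) can be made concrete with a small sketch of relation-based distillation. It assumes that a token relation is a row-wise softmax over scaled dot products between token features and that the loss is the mean KL divergence from teacher relations to student relations. The function names are hypothetical and the exact relation definition used by TinyMIM may differ.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def token_relations(tokens):
    """N x N relation matrix: softmax of scaled dot products between tokens."""
    d = len(tokens[0])
    return [
        softmax([sum(a * b for a, b in zip(ti, tj)) / math.sqrt(d)
                 for tj in tokens])
        for ti in tokens
    ]

def relation_kl(teacher_tokens, student_tokens):
    """Mean KL(teacher || student) over the rows of the relation matrices."""
    t_rel = token_relations(teacher_tokens)
    s_rel = token_relations(student_tokens)
    total = 0.0
    for t_row, s_row in zip(t_rel, s_rel):
        total += sum(t * math.log(t / s) for t, s in zip(t_row, s_row))
    return total / len(t_rel)

# Three tokens each; the teacher uses 4-dim features, the student 2-dim (toy values).
teacher = [[1.0, 0.0, 0.5, -0.5], [0.0, 1.0, 0.5, 0.5], [1.0, 1.0, 0.0, 0.0]]
student = [[0.8, 0.1], [0.1, 0.9], [0.7, 0.6]]

loss = relation_kl(teacher, student)
self_loss = relation_kl(teacher, teacher)
print(loss, self_loss)
```

Because the relation matrices are N x N regardless of feature width, this kind of loss needs no projection layer when the teacher and student have different embedding dimensions.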

Conservation Tools: The Next Generation of Engineering--Biology Collaborations

Andrew Schulz , Cassie Shriver , Suzanne Stathatos , Benjamin Seleb , Emily Weigel , Young-Hui Chang , M. Saad Bhamla , David Hu , Joseph R. Mendelson III

Category: Machine Learning

2023-01-03

The recent increase in public and academic interest in preserving biodiversity has led to the growth of the field of conservation technology. This field involves designing and constructing tools that utilize technology to aid in the conservation of wildlife. In this article, we will use case studies to demonstrate the importance of designing conservation tools with human-wildlife interaction in mind and provide a framework for creating successful tools. These case studies include a range of complexities, from simple cat collars to machine learning and game theory methodologies. Our goal is to introduce and inform current and future researchers in the field of conservation technology and provide references for educating the next generation of conservation technologists. Conservation technology not only has the potential to benefit biodiversity but also has broader impacts on fields such as sustainability and environmental protection. By using innovative technologies to address conservation challenges, we can find more effective and efficient solutions to protect and preserve our planet's resources.

StyleTalk: One-shot Talking Head Generation with Controllable Speaking Styles

Yifeng Ma , Suzhen Wang , Zhipeng Hu , Changjie Fan , Tangjie Lv , Yu Ding , Zhidong Deng , Xin Yu

Category: Computer Vision

2023-01-03

Different people speak with diverse personalized speaking styles. Although existing one-shot talking head methods have made significant progress in lip sync, natural facial expressions, and stable head motions, they still cannot generate diverse speaking styles in the final talking head videos. To tackle this problem, we propose a one-shot style-controllable talking face generation framework. In a nutshell, we aim to attain a speaking style from an arbitrary reference speaking video and then drive the one-shot portrait to speak with the reference speaking style and another piece of audio. Specifically, we first develop a style encoder to extract dynamic facial motion patterns of a style reference video and then encode them into a style code. Afterward, we introduce a style-controllable decoder to synthesize stylized facial animations from the speech content and style code. In order to integrate the reference speaking style into generated videos, we design a style-aware adaptive transformer, which enables the encoded style code to adjust the weights of the feed-forward layers accordingly. Thanks to the style-aware adaptation mechanism, the reference speaking style can be better embedded into synthesized videos during decoding. Extensive experiments demonstrate that our method is capable of generating talking head videos with diverse speaking styles from only one portrait image and an audio clip while achieving authentic visual effects. Project Page: https://github.com/FuxiVirtualHuman/styletalk.

Boosting Neural Networks to Decompile Optimized Binaries

Ying Cao , Ruigang Liang , Kai Chen , Peiwei Hu

Category: Machine Learning

2023-01-03

Decompilation aims to transform a low-level program language (LPL) (e.g., a binary file) into its functionally equivalent high-level program language (HPL) (e.g., C/C++). It is a core technology in software security, especially in vulnerability discovery and malware analysis. In recent years, with the successful application of neural machine translation (NMT) models in natural language processing (NLP), researchers have tried to build neural decompilers by borrowing the idea of NMT. They formulate the decompilation process as a translation problem between LPL and HPL, aiming to reduce the human cost required to develop decompilation tools and improve their generalizability. However, state-of-the-art learning-based decompilers do not cope well with compiler-optimized binaries. Since real-world binaries are mostly compiler-optimized, decompilers that do not consider optimized binaries have limited practical significance. In this paper, we propose a novel learning-based approach named NeurDP that targets compiler-optimized binaries. NeurDP uses a graph neural network (GNN) model to convert LPL to an intermediate representation (IR), which bridges the gap between source code and optimized binary. We also design an Optimized Translation Unit (OTU) to split functions into smaller code fragments for better translation performance. Evaluation results on datasets containing various types of statements show that NeurDP can decompile optimized binaries with 45.21% higher accuracy than state-of-the-art neural decompilation frameworks.
